Abstract

We show one possible dynamical approach to the study of the distribu- 
tion of prime numbers. Our approach is based on two complexity methods, 
the Computable Information Content and the Entropy Information Gain, 
looking for analogies between the prime numbers and intermittency. 



1 Introduction 

The distribution of prime numbers is today one of the most famous open
problems in mathematics. This difficult problem has been studied over the
last two centuries by many mathematicians, using many different techniques.
More recently, many papers devoted to prime numbers have also appeared in the
physical literature, with two different points of view: a numerical approach,
using methods developed for physical systems; and a theoretical approach,
where prime numbers are found in some properties of physical systems.

This paper can be considered an attempt to combine the mathematical and the
physical approaches. We analyze prime numbers using numerical methods
developed to study complexity in time series, and look for interpretations of the
results in the realm of dynamical systems, in particular intermittent dynamical
systems.

We first explain in Section 2 the complexity methods we use. Then in Section
3 we start to build a dynamical approach to prime numbers, looking for a
dynamical model and testing it with the complexity methods. Interestingly,
the two methods seem to show some differences, a sign of how difficult it is to
give a universal notion of complexity.

Finally, in Section 4 we turn back to mathematics, to show that the prime
numbers seem to have different statistical properties with respect to our
dynamical models. Hence the two methods, which reveal these discrepancies in
different ways, seem able, if used together, to provide a deep analysis of time
series.
2 Complexity methods for time series



There are many methods used in the literature to study the "complexity" of a time
series. In this paper we use two methods that follow from the interpretation of
a time series as an information source: Computable Information Content ([?], [2])
and Diffusion Entropy [?]. Both methods aim to study the complexity of
the underlying dynamics that produces the time series. The two methods have
also been studied in [?].



2.1 Computable Information Content 

Let $(X, T, \mu)$ be an invariant dynamical system, where $X$ is a compact metric
space (the phase space) with the Borel $\sigma$-algebra, $T : X \to X$ is a measurable
transformation of $X$, and $\mu$ is a $T$-invariant measure on $X$, that is
$\mu(T^{-1}(A)) = \mu(A)$ for all measurable sets $A \subset X$ (where $T^{-1}(A)$ is the
preimage of the set $A$).

One of the classical indicators of chaotic behavior of a dynamical system is
given by the Kolmogorov-Sinai (KS) entropy $h_\mu(X, T)$, defined for $T$-invariant
probability measures $\mu$ on $X$ ([?], [?]). Chaitin and Kolmogorov independently
introduced another method based on information theory to study chaotic
systems, the Algorithmic Information Content (AIC) (see [?] and [?]). The two
indicators of chaos are shown to coincide.

To study the information produced by a dynamical system, let $Z$ be a finite
partition of the phase space $X$ into measurable sets $\{I_1, \dots, I_N\}$. To any point
$x \in X$ we can associate an infinite string $\omega = (\omega_0, \omega_1, \dots, \omega_n, \dots)$, where for all
$n \in \mathbb{N}$ it holds $\omega_n \in \mathcal{A} = \{1, \dots, N\}$ and $T^n(x) \in I_{\omega_n}$. The set $\mathcal{A}$ is called the
alphabet associated to the partition $Z$, and the set of such strings $\omega$ is denoted
by $\Omega \subseteq \mathcal{A}^{\mathbb{N}}$. We call the map $\varphi_Z : X \to \Omega$ the symbolic representation of $(X, T)$
relative to $Z$. The space $\Omega$ is a compact metric space, with metric given by

$$d(\omega, \omega') = \sum_{n \in \mathbb{N}} \frac{1 - \delta(\omega_n, \omega'_n)}{2^n}$$

where $\delta(i,j) = 0$ if $i \neq j$ and $\delta(i,j) = 1$ if $i = j$. The Borel $\sigma$-algebra on $\Omega$ is
generated by the so-called cylinders $C_\omega^{(k,h)}$, defined for all integers $k < h$ and
$\omega \in \Omega$ by

$$C_\omega^{(k,h)} = \{ \omega' \in \Omega : \omega'_n = \omega_n \ \ \forall n = k, \dots, h \}$$

Finally let $\sigma : \Omega \to \Omega$ be the continuous transformation of $\Omega$ given by
$(\sigma(\omega))_n = \omega_{n+1}$ for all $n \in \mathbb{N}$. The map $\sigma$ is called the shift map. Any $T$-invariant
measure $\mu$ on $X$ induces, through the symbolic representation, a $\sigma$-invariant measure
$\nu := (\varphi_Z)_* \mu$ on $\Omega$. Then $(\Omega, \sigma, \nu)$ is a dynamical system, called the symbolic
dynamical system.

When the symbolic representation $\varphi_Z$ is injective, the dynamical systems
$(X, T, \mu)$ and $(\Omega, \sigma, \nu)$ are topologically conjugate. In this case the entropies
$h_\mu(X, T, Z)$ and $h_\nu(\Omega, \sigma)$ coincide, for any probability measures $\mu$ and
$\nu = (\varphi_Z)_* \mu$, respectively $T$-invariant and $\sigma$-invariant. Here $h_\mu(X, T, Z)$ is the KS entropy
of the dynamical system $(X, T, \mu)$ with respect to the partition $Z$. It holds
$h_\mu(X, T) = \sup \{ h_\mu(X, T, Z) : Z \text{ is a finite partition of } X \}$, and the supremum
is attained on generating partitions.

The method of symbolic dynamics tells us that, to study the information
produced by the dynamical system $(X, T, \mu)$, we can look at the different
symbolic representations of the system, and in particular at those corresponding to
generating partitions. We then need to study the information contained, in some
sense, in strings. One way is given by the Algorithmic Information Content
(AIC).

Let $s$ be a finite string on an alphabet $\mathcal{A}$, and let $C$ be a universal Turing
machine. If $p$ denotes a program given to $C$ as input, we define

$$AIC(s) = \min \{ |p| : C(p) = s \} \qquad (1)$$

where $|p|$ is the binary length of the program $p$. In words, the AIC of a finite
string $s$ is the binary length of the smallest program $p$ that gives $s$ as output
when run on a universal Turing machine.

The idea underlying the definition of the KS entropy is to estimate the rate
at which the information necessary to reconstruct an orbit of the dynamical
system increases with time. Analogously, we study the rate at which the
information contained in an infinite string increases with the length of the string.
Formally, let $\omega \in \Omega$ be an infinite string; then the complexity $K(\omega)$ is defined
as

$$K(\omega) = \limsup_{n \to \infty} \frac{AIC(\omega^n)}{n} \qquad (2)$$

where $\omega^n = (\omega_0, \dots, \omega_{n-1})$ is the $n$-long string made of the first $n$ symbols of
the infinite string $\omega$.

At this point it is immediate, by the method of symbolic representation of a
dynamical system, to introduce a notion of complexity for orbits of a dynamical
system using the AIC. Let $(X, T, \mu)$ be an invariant dynamical system,
and let $Z$ be a finite measurable partition of $X$. Then via the map $\varphi_Z$ to each
point $x \in X$ we associate an infinite string $\omega \in \Omega \subseteq \mathcal{A}^{\mathbb{N}}$, where $\mathcal{A}$ is the finite
alphabet associated to $Z$. The complexity $K(x, Z)$ of a point $x$ with respect
to the partition $Z$ is then given by

$$K(x, Z) := K(\varphi_Z(x)) = \limsup_{n \to \infty} \frac{AIC(x, n, Z)}{n} \qquad (3)$$

where $AIC(x, n, Z) := AIC(\varphi_Z(x)^n)$.

The complexity of a point $x$ and the KS entropy of a dynamical system
are related by the following result (see [?], [2]): let $(X, T, \mu)$ be an invariant
dynamical system, and let $\mu$ be a probability measure; then

$$\int_X K(x, Z) \, d\mu = h_\mu(X, T, Z) \qquad (4)$$

for any finite measurable partition $Z$ of $X$. In particular, if the probability
measure $\mu$ is also ergodic then

$$K(x, Z) = h_\mu(X, T, Z) \qquad (5)$$

for $\mu$-almost any $x \in X$. To obtain the KS entropy of a system, one can choose
a generating partition $Z$, for which $h_\mu(X, T, Z) = h_\mu(X, T)$. If the invariant ergodic
measure $\mu$ is infinite then (see [?] for the exact hypotheses on systems with infinite
measures)

$$K(x, Z) = 0 \qquad (6)$$

for $\mu$-almost any $x \in X$.

This result on complexity is not particularly informative for probability
invariant measures with $h_\mu(X, T) = 0$ for which the system is nevertheless chaotic
(such systems are called weakly chaotic), or for infinite invariant measures. In these
cases one can look at the rate of increase of the AIC for the orbits
of the system. Indeed, if $h_\mu(X, T, Z) > 0$ then, for a finite ergodic system, it
holds $AIC(x, n, Z) \sim h_\mu(X, T, Z) \, n$ for $\mu$-almost any $x$, whereas for ergodic weakly
chaotic systems and for ergodic infinite systems it holds $AIC(x, n, Z) = o(n)$.
Hence, to classify the complexity of the system, one can study
the asymptotic behavior of the AIC. This has been done for some examples of
weakly chaotic and infinite systems; the results are collected in [2].

The problem with this approach is that the Algorithmic Information Content
is not a computable function, in the sense that there is no algorithm
able to compute the AIC of every finite string. This result is related to the
halting problem for Turing machines. Hence in practice one studies an
algorithm that approximates the AIC. We use compression algorithms, which
are formally defined in [2]. Given a finite string $s \in \mathcal{A}^*$, the set of all finite words
in the alphabet $\mathcal{A}$, a compression algorithm takes it as input and gives as output
a finite string $s' \in \{0, 1\}^*$ that contains, in some sense, all the
information necessary to reconstruct $s$. We use the length of the compressed
string $s'$ as an approximation of $AIC(s)$, and call it the Computable Information Content
(CIC) of $s$. In the following we use a particular compression algorithm called
CASToRe (introduced in [?]), which has been tested on some weakly chaotic and
infinite systems, giving good approximations of the AIC. So we have $CIC(s) =
|CASToRe(s)|$, where $|\cdot|$ denotes the length of a string.
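As a rough illustration of how a compression algorithm can stand in for the AIC, the following sketch measures the compressed length of growing prefixes of a string. It is only a sketch under stated assumptions: it uses Python's standard `zlib` compressor as a substitute for CASToRe, which is not reproduced here, and the helper names `compressed_bits` and `information_curve` are hypothetical.

```python
import random
import zlib


def compressed_bits(symbols: str) -> int:
    """Length in bits of the zlib-compressed string (a stand-in for CIC, not CASToRe)."""
    return 8 * len(zlib.compress(symbols.encode("ascii"), 9))


def information_curve(symbols: str, lengths):
    """Compressed size of the first n symbols, for each n in lengths."""
    return [(n, compressed_bits(symbols[:n])) for n in lengths]


if __name__ == "__main__":
    rng = random.Random(0)
    n = 20000
    periodic = "01" * (n // 2)                           # fully predictable string
    coin = "".join(rng.choice("01") for _ in range(n))   # independent fair coin tosses
    for name, s in (("periodic", periodic), ("coin tosses", coin)):
        print(name, information_curve(s, [1000, 5000, 20000]))
```

For the periodic string the compressed length should stay very small, while for the coin tosses it should grow roughly linearly in n, which mirrors the distinction between $AIC(x, n, Z) = o(n)$ and $AIC(x, n, Z) \sim h n$ made above.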

2.2 Diffusion Entropy (DE) 

The main idea of this method is to investigate the complexity of a time series by
building up a diffusion process, without any kind of pre-processing of the series. The
technique is based principally on the Continuous Time Random Walk (CTRW)
and on the Generalized Central Limit Theorem (GCLT). It is very sensitive to
time randomness, when the deviation from Poisson statistics generates Lévy
diffusion, and to the trend of a non-stationary series, as in the case of prime numbers. Let
$\xi_i$, with $i = 1, \dots, M$, be a sequence of $M$ numbers. The purpose of the DE is
to establish the possible existence of a scaling, either normal or anomalous, in
the most efficient way possible, without altering the data with any form of
detrending. Let $l$ be an integer satisfying $1 \le l \le M$. This
integer will be referred to as time. For any given $l$, there are $M - l + 1$
sub-sequences defined by

$$\xi_i^{(s)} = \xi_{i+s}, \qquad s = 0, \dots, M - l \qquad (7)$$

For each of these sub-sequences a diffusion trajectory is built up, labeled with
the index $s$ and defined by

$$x^{(s)}(l) = \sum_{i=1}^{l} \xi_i^{(s)} = \sum_{i=1}^{l} \xi_{i+s} \qquad (8)$$

This position can be imagined as that of a Brownian particle that, at
regular intervals of time, has been jumping forward or backward according
to the prescription of the corresponding sub-sequence. This means that the
particle, before reaching the position that it holds at time $l$, has made $l$
jumps. The jump made at the $i$-th step has intensity $|\xi_i^{(s)}|$ and is forward
or backward according to whether the number $\xi_i^{(s)}$ is positive or negative. Now
the entropy of this diffusion process can be evaluated. To do that, the $x$-axis
is partitioned into cells of size $\epsilon(l)$, and the cells are labeled.
The number of particles found in the $i$-th cell at a given
time $l$ is denoted by $N_i(l)$. This number is used to determine the probability
$p_i(l)$ that a particle is found in the $i$-th cell at time $l$, by means of

$$p_i(l) = \frac{N_i(l)}{M - l + 1} \qquad (9)$$

At this stage the entropy of the diffusion process at time $l$ is determined, and
reads

$$S_d(l) = -\sum_i p_i(l) \ln[p_i(l)] \qquad (10)$$

The easiest way to proceed with the choice of the cell size $\epsilon(l)$ is to assume
it independent of $l$ and to determine it as a suitable fraction of the square root of
the variance of the fluctuations in the data $\xi_i$. A brief comment on the way
the trajectories are defined: the method is based on the idea of a moving window
of size $l$, which makes the $s$-th trajectory closely correlated to the next one. The
two trajectories have $l - 1$ values in common. A motivation for this choice is a
possible connection with the Kolmogorov-Sinai (KS) entropy. The KS entropy
of a symbolic sequence is evaluated by moving a window of size $l$ along the
sequence. Any window position corresponds to a given combination of symbols,
and from the frequency of each combination it is possible to derive the Shannon
entropy $S(l)$. The KS entropy is given by the limit $\lim_{l \to \infty} S(l)/l$. Hence the
idea is that the same sequence, analyzed with the DE method, should yield
at large values of $l$ a well-defined scaling $\delta$. As a simplifying assumption, let us
consider large values of time, so that the continuous-time approximation is valid. In this
case, the trajectories built up with the procedure illustrated above correspond
to the following equation of motion:

$$\frac{dx}{dt} = \xi(t) \qquad (11)$$

where $\xi(t)$ denotes the value of the time series under study at time $t$. This means
that the function $\xi(l)$ is regarded as a function of $t$, thought of as a continuous
time $t = l$. In this case, the Shannon entropy reads

$$S(t) = -\int_{-\infty}^{\infty} p(x,t) \log[p(x,t)] \, dx \qquad (12)$$

$p(x,t)$ being the probability distribution function of the variable $x$ at time $t$.

It has to be pointed out that scaling is a property implying that the
probability distribution function can be written as

$$p(x,t) = \frac{1}{t^{\delta}} F\left(\frac{x}{t^{\delta}}\right) \qquad (13)$$

Plugging equation (13) into equation (12), after a little algebra, $S(t)$
reads

$$S(t) = A + \delta \log(t) \qquad (14)$$

where

$$A = -\int_{-\infty}^{\infty} F(y) \log[F(y)] \, dy \qquad (15)$$

It is now evident that this technique for detecting scaling does not involve
any form of detrending. The technique gives a genuine scaling only in the case of a
stationary time series; on the contrary, as can be proved analytically and
numerically, if there is a trend we find a scaling with $\delta = 1$, as in the case of the
prime numbers series. If instead the data are the output of a process of
independent and identically distributed random variables, then by the Central
Limit Theorem the probability distribution function $p(x,t)$ converges to the
Gauss distribution as $t \to \infty$, hence we find a scaling parameter $\delta = 0.5$.
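The procedure of equations (7)-(10) and the fit of equation (14) can be sketched numerically as follows. This is a minimal illustration, not the implementation used for the results in this paper: the function names, the fixed cell size, the choice of times and the least-squares fit of $S(l)$ against $\ln l$ are all assumptions made for the example.

```python
import math
import random


def diffusion_entropy(xi, l, cell):
    """Entropy S(l) of the positions x^(s)(l) = xi[s] + ... + xi[s+l-1], binned in cells."""
    prefix = [0.0]
    for value in xi:
        prefix.append(prefix[-1] + value)
    n_traj = len(xi) - l + 1               # M - l + 1 overlapping windows
    counts = {}
    for s in range(n_traj):
        x = prefix[s + l] - prefix[s]      # position of trajectory s at time l
        k = math.floor(x / cell)           # label of the cell containing x
        counts[k] = counts.get(k, 0) + 1
    return -sum(c / n_traj * math.log(c / n_traj) for c in counts.values())


def scaling_exponent(xi, times, cell):
    """Least-squares slope of S(l) against ln(l), used as an estimate of delta."""
    xs = [math.log(l) for l in times]
    ys = [diffusion_entropy(xi, l, cell) for l in times]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return num / sum((x - mx) ** 2 for x in xs)


if __name__ == "__main__":
    rng = random.Random(1)
    noise = [rng.gauss(0.0, 1.0) for _ in range(40000)]   # i.i.d. data, unit variance
    print(scaling_exponent(noise, [8, 16, 32, 64, 128], cell=0.5))
```

For independent Gaussian data the estimated slope should lie near the value 0.5 discussed above; the cell size 0.5 is simply half the standard deviation of the data.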

In the same spirit of not pre-processing the real data, a conditional-entropy
technique called Entropy Information Gain (EIG) [12] can be used, together
with the DE method, to detect the residual entropy associated with the
independent fluctuations superimposed on the original trend. Let $\xi_i$ be the original
data and $\chi_i$ the time series produced using the hypothetical trend; then the
corresponding trajectories are

$$x^{(s)}(l) = \sum_{i=1}^{l} \xi_i^{(s)} \quad \text{and} \quad y^{(s)}(l) = \sum_{i=1}^{l} \chi_i^{(s)} \qquad (16)$$

where we follow the notation of equations (7) and (8). At this point we
can repeat the argument used for the DE to pass to continuous time,
and define the EIG of the original data with respect to the hypothetical trend
as the function $I(x, y, t)$ given by

$$I(x,y,t) = -\int_{-\infty}^{+\infty} p(y,t) \, dy \int_{-\infty}^{+\infty} p(x|y,t) \log p(x|y,t) \, dx \qquad (17)$$

where $p(x,t)$ and $p(y,t)$ are the probability distribution functions associated
with the trajectories $x$ and $y$ at time $t$, and $p(x|y,t) = p((x,y),t)/p(y,t)$,
$p((x,y),t)$ being the joint probability distribution function. After some simple algebra
we find (see equation (12))

$$I(x,y,t) = S((x,y),t) - S(y,t) \qquad (18)$$

which is an additive relation. It is then evident that the better the hypothetical
trend approximates the real time series, the smaller the EIG. Moreover,
applying the DE method to the EIG, that is looking for scaling properties of the
function $I(x,y,t)$, we interpret a value $\delta = 0.5$ as the result of independent fluctuations
in the original data, superimposed on the hypothetical trend.
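Equation (18) suggests a direct numerical estimate: bin the pairs of window sums and subtract the entropy of the trend from the joint entropy. The sketch below is one hypothetical way to do this; the binning with a common cell size, the function names and the sinusoidal test trend are assumptions of the example and are not taken from [12].

```python
import math
import random
from collections import Counter


def shannon(counts, total):
    """Shannon entropy of an empirical distribution given as a table of counts."""
    return -sum(c / total * math.log(c / total) for c in counts.values())


def window_sums(series, l):
    """Positions of all diffusion trajectories of length l (moving-window sums)."""
    prefix = [0.0]
    for v in series:
        prefix.append(prefix[-1] + v)
    return [prefix[s + l] - prefix[s] for s in range(len(series) - l + 1)]


def entropy_information_gain(data, trend, l, cell):
    """I(l) = S((x, y), l) - S(y, l) for binned window sums x (data) and y (trend)."""
    xs = [math.floor(x / cell) for x in window_sums(data, l)]
    ys = [math.floor(y / cell) for y in window_sums(trend, l)]
    joint = Counter(zip(xs, ys))
    marginal = Counter(ys)
    return shannon(joint, len(xs)) - shannon(marginal, len(ys))


if __name__ == "__main__":
    rng = random.Random(2)
    trend = [math.sin(i / 50.0) for i in range(5000)]      # hypothetical trend
    data = [t + rng.gauss(0.0, 1.0) for t in trend]        # trend plus fluctuations
    for l in (5, 10, 20, 40):
        print(l, entropy_information_gain(data, trend, l, cell=0.5))
```

When the data coincide with the trend the estimate vanishes, in agreement with the remark that a better trend gives a smaller EIG.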

3 Intermittency and the prime numbers 

We now apply our methods to two different time series, studying their differences
and analogies. We first study the time series coming from an intermittent
dynamical system, the Manneville map. The Manneville map is a dynamical
system defined on the interval $I = [0, 1]$ by

$$T_z(x) = x + x^z \pmod 1, \qquad z > 1 \qquad (19)$$

where $z$ is a real parameter. This map has a fixed point at
$x = 0$ which is not hyperbolic; indeed $T_z'(0) = 1$ for all $z > 1$. This makes the
system interesting to study, in particular for values of the parameter $z \ge 2$. Indeed
in that range the invariant measure that is absolutely continuous with respect
to the Lebesgue measure is infinite. For further properties we refer to [5], [13].

This map was originally introduced in [13] as a simple dynamical model
for turbulence. Indeed the presence of the non-hyperbolic fixed point at the
origin generates a phenomenon known as intermittency: an alternation of long
laminar phases and short chaotic bursts. Intermittency arises because a point
that comes close to the origin takes a long time to "escape"
from it (the laminar phase); it then spends a short period of time
far from the origin (the chaotic burst) before falling near the origin again,
and this process repeats over and over.

The behavior of the AIC for orbits of the Manneville map is studied in [?].
It is proved that

$$E[AIC(x, n, Z)] \sim \begin{cases} n & \text{if } z < 2 \\ n / \log n & \text{if } z = 2 \\ n^{1/(z-1)} & \text{if } z > 2 \end{cases} \qquad (20)$$

where $E[\cdot]$ denotes the mean with respect to the Lebesgue measure, and the
chosen partition is $Z = \{[0, \bar{x}), [\bar{x}, 1]\}$, $\bar{x}$ being the real number in $(0, 1)$
such that $\bar{x} + \bar{x}^z = 1$. The same behavior is obtained for the Computable
Information Content $CIC(x, n, Z)$ using CASToRe ([2]).

The second time series we study is obtained from the series of prime numbers.
Let $\{1, 2, 3, \dots\}$ be the positive integers; we associate to each number the symbol
"1" if it is a prime number, and the symbol "0" if it is not. So we construct
an infinite string $\omega \in \{0, 1\}^{\mathbb{N}}$, whose first symbols are $\omega = (0, 1, 1, 0, 1, 0, 1, \dots)$.
The most famous question on prime numbers is "how many prime numbers are
there in the first $n$ integers?". In our formulation this question becomes "how
many "1"s are there in the first $n$ symbols of $\omega$?". We refer to [?] for the formal
approach. The first answer to this question was given by Gauss in 1849, and
his result is contained in the

Prime Numbers Theorem. If $\pi(n)$ is the number of primes among the first $n$
integers, it holds

$$\pi(n) \sim \mathrm{Li}(n) := \int_2^n \frac{dt}{\log t} \sim \frac{n}{\log n}$$

We now consider the information content of
the string $\omega$. Since there are many algorithms that generate the prime numbers
(think for example of the sieve of Eratosthenes), if we want to estimate $AIC(\omega^n)$ it is
clear that it is of the order $\log n$: the algorithm that generates the primes
only needs to know at which integer $n$ it has to stop, and this requires $\log n$ bits
of information. So the shortest program has length of the order $\log n$. But let us
imagine forgetting the origin of the string $\omega$, and look for its information content.
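The construction of the string of primes can be sketched with the sieve of Eratosthenes mentioned above. The code below is an illustrative sketch, and the function name `prime_indicator_string` is hypothetical; it builds the first $n$ symbols of the string and compares the number of "1"s with $n/\log n$.

```python
import math


def prime_indicator_string(n: int) -> str:
    """First n symbols of the string: symbol number i (counting from 1) is '1' iff i is prime."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0:2] = b"\x00\x00"            # 0 and 1 are not prime
    for p in range(2, math.isqrt(n) + 1):
        if is_prime[p]:
            # cross out the multiples of p, starting from p*p
            is_prime[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return "".join("1" if is_prime[i] else "0" for i in range(1, n + 1))


if __name__ == "__main__":
    for n in (10**3, 10**4, 10**5, 10**6):
        omega_n = prime_indicator_string(n)
        # number of "1"s, that is pi(n), next to the first approximation n / log n
        print(n, omega_n.count("1"), round(n / math.log(n)))
```

The program itself has a fixed size and only the stopping point $n$ changes, which is the reason why $AIC(\omega^n)$ is of the order $\log n$.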

In [13] it is shown that the compression of a binary string obtained by simply
recording how many "0"s there are between two consecutive "1"s is the
"best" compression for strings of the Manneville map ("best" in the sense that
it approximates the behavior of the AIC up to a constant). This is due to two
properties of strings from the Manneville map: there are long sequences made
only of "0"s (corresponding to the laminar phases) and short sequences with
chaotic alternation of "0"s and "1"s (the chaotic bursts); and
the passage through a chaotic burst can be considered as a phenomenon
deleting the memory of the system, so that what happens after a symbol "1" does not
depend on what happened before. The string $\omega$ of prime numbers shares the
property of long sequences of "0"s and rare appearances of "1"s, so we can try
to apply this kind of compression to $\omega$ and see what happens, even if the property
of loss of memory is false for prime numbers. It is not difficult to see that the
length of the new string obtained is related to the number of symbols "1" in the
original string, and hence to the function $\pi(n)$. This argument shows that
we can imagine the string $\omega$ as given by one of the orbits of the Manneville
map with $z = 2$, both having the same asymptotic behavior of the information
content (see equation (20)).
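The compression by counting the "0"s between consecutive "1"s can be sketched as below. The run-length step follows the description in the text; the cost in bits uses an Elias gamma code for each run length, which is an assumption made only to obtain a concrete number and is not the coding analyzed in [13]. All function names are hypothetical.

```python
def zero_runs(symbols: str):
    """Lengths of the blocks of '0' preceding each '1', plus the trailing block of '0'."""
    runs, current = [], 0
    for ch in symbols:
        if ch == "1":
            runs.append(current)
            current = 0
        else:
            current += 1
    return runs, current


def expand(runs, tail):
    """Inverse of zero_runs: rebuild the original binary string."""
    return "".join("0" * r + "1" for r in runs) + "0" * tail


def run_length_bits(symbols: str) -> int:
    """Bits needed to store the run lengths, coding each value r + 1 with Elias gamma."""
    runs, tail = zero_runs(symbols)
    # the Elias gamma code of m >= 1 uses 2 * floor(log2 m) + 1 bits
    return sum(2 * (r + 1).bit_length() - 1 for r in runs + [tail])


if __name__ == "__main__":
    s = "0" * 1000 + "1" + "0" * 500 + "1"
    runs, tail = zero_runs(s)
    print(runs, tail, run_length_bits(s), len(s))
```

Since one number is stored for each "1", the compressed length is governed by the number of "1"s and by the logarithm of the run lengths, which is how the function $\pi(n)$ enters the argument above.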

We remark that this approach to the study of prime numbers is similar to the
probabilistic approach introduced by Cramér in [16]: we assume that the
string $\omega$ is one of a family of strings, on which there is a probability measure $\mu$. If
a property holds for $\mu$-almost all strings of the family, then there is a "good"
chance that it holds also for $\omega$. Of course this is not formally true, since the string $\omega$
could be in the set of null $\mu$-measure for which the property does not hold, but
it is certainly a good first guess for the properties of $\omega$. Our aim is to look
for a dynamical system $(X, T)$ whose orbits have (in some probabilistic sense)
the same statistical properties as $\omega$, so that we can think of $\omega$ as the symbolic
representation of an orbit of the system, and guess some new properties of $\omega$
from those of the system.

Following the idea explained above, we compare the statistical properties of
strings generated by the Manneville map $T_2$ and of the infinite string $\omega$. We apply
the methods introduced in Section 2 to study the complexity of the time series.

We first apply the compression algorithm CASToRe. We built the first
$10^7$ symbols of the string $\omega$ and 10 orbits of the Manneville map $T_2$ of the same
length, corresponding to different initial conditions (recall that
for the AIC of $T_2$ we only have results in mean with respect to the Lebesgue
measure). Figure 1 shows the behavior of the information content for
all the series.
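The symbolic orbits of the Manneville map used in this comparison can be generated along the following lines. This is a sketch in double-precision arithmetic, with hypothetical names and much shorter orbits than the ones analyzed here; round-off may in principle drive an orbit onto the fixed point at the origin, which a careful experiment would have to monitor.

```python
import random


def manneville_symbols(x0: float, n: int, z: float = 2.0) -> str:
    """Symbolic orbit of T_z(x) = x + x**z (mod 1) for the partition {[0, xbar), [xbar, 1]}."""
    # xbar solves xbar + xbar**z = 1; the left side is increasing, so bisection applies
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if mid + mid ** z < 1.0:
            lo = mid
        else:
            hi = mid
    xbar = 0.5 * (lo + hi)
    out, x = [], x0
    for _ in range(n):
        out.append("0" if x < xbar else "1")   # "0": laminar zone, "1": chaotic burst
        x = (x + x ** z) % 1.0
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(0)
    # ten orbits from random initial conditions, as in the experiment described above
    orbits = [manneville_symbols(rng.random(), 10_000) for _ in range(10)]
    for s in orbits:
        print(s.count("1"), "symbols '1' out of", len(s))
```

Each string can then be passed to a compression algorithm and compared with the string of primes of the same length.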




Figure 1: The information content measured using CASToRe for the series of 
the prime numbers and ten different orbits of the Manneville map T2 . The scale 
is bi-logarithmic. 

The graph is in bi-logarithmic scale, hence the behavior $CIC(n) \sim n/\log n$
becomes a straight line with angular coefficient 1, perturbed by a logarithm that
is smaller than $7 \log 10$. In this graph it is difficult to distinguish the different
curves, but there is one that is practically straight, the one of $\omega$. Of course
the algorithm CASToRe only gives an approximation of the AIC of our series,
hence we cannot expect the curves to be exactly of the order $n/\log n$, but only
approximations to it. However, our aim is not to estimate the information
content of the series, but simply to compare their behaviors;
hence we can neglect the effects due to the use of the particular compression
algorithm, since they are the same on each series.

From this first figure we could then conclude that the Manneville map $T_2$
is a "good" dynamical system to generate the series of the prime numbers, in
the sense explained above. However, if we look at the same graphs in bi-linear
scale (Figure 2), we notice a small difference between the straight line (the
information content of $\omega$) and the other curves.

Figure 2: The same as in Figure 1, but in bi-linear scale.



This small difference is not visible in the bi-logarithmic figure, since the
bi-logarithmic scale reduces the differences at high values of $n$. Hence the
information content of the string $\omega$ seems to be asymptotically slightly different
from the mean information content of orbits of $T_2$.

This difference can be explained in terms of the approximation made in the
Prime Numbers Theorem. Indeed the function $\mathrm{Li}(n)$ can be estimated using
integration by parts, obtaining the approximation

$$\mathrm{Li}(n) = \frac{n}{\log n} + \frac{n}{\log^2 n} + O\left(\frac{n}{\log^3 n}\right)$$

and this shows that the number of primes in the first $n$ integers (that is, the
number of symbols "1" in the first $n$ symbols of the string $\omega$) grows slightly
faster than $n/\log n$.
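The integration by parts behind this estimate can be written out as follows. This is only a sketch, under the assumption that $\mathrm{Li}(n)$ is taken with lower limit 2; the boundary terms at $t = 2$ are constants and are absorbed in the error term.

```latex
\begin{align*}
\mathrm{Li}(n) &= \int_2^n \frac{dt}{\log t}
   = \left[ \frac{t}{\log t} \right]_2^n + \int_2^n \frac{dt}{\log^2 t} \\
 &= \left[ \frac{t}{\log t} + \frac{t}{\log^2 t} \right]_2^n
    + 2 \int_2^n \frac{dt}{\log^3 t} \\
 &= \frac{n}{\log n} + \frac{n}{\log^2 n} + O\!\left( \frac{n}{\log^3 n} \right)
\end{align*}
```

Each step uses $\frac{d}{dt} \frac{t}{\log^k t} = \frac{1}{\log^k t} - \frac{k}{\log^{k+1} t}$, and the second term $n/\log^2 n$ is positive, which accounts for the slightly faster growth.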

We then apply the Entropy Information Gain method to the same series.
First of all we study how good the approximation of the function $\pi(n)$ given by
the Prime Numbers Theorem is. We build up the different trajectories to analyze
simply by summing the number of primes contained in the strings of length $l$. As
hypothetical trend we use the first approximation of the Prime Numbers
Theorem, $\pi(n) \sim n/\log n$. Figure 3 shows the behavior of the residual
Shannon entropy, which follows a straight line with angular coefficient $\approx 0.4$.




Figure 3: EIG for the prime numbers series, using as trend the first
approximation of the function $\mathrm{Li}(n)$ (horizontal axis: $\ln(n)$).



This value of $\delta$, close to the Gaussian value 0.5, seems to suggest that all
the information of the prime series is contained in the Prime Numbers Theorem.
To investigate the trend of the primes further, we have also calculated the EIG
using more terms of the approximation of the function $\mathrm{Li}(n)$ (Figure 4), but the results
do not change. This could reflect a limit in the sensitivity of the method,
which seems to feel only the first approximation of a hypothetical trend.



Figure 4: The same as in Figure 3, but with the first two orders of approximation
for $\mathrm{Li}(n)$.

At this point we have calculated the EIG using the prime numbers series as data
and, as hypothetical trend, the series of orbits produced by the Manneville
map for $z = 2$ (Figure 5). Also in this case the angular coefficient of the
asymptotic straight line is $\approx 0.5$. This is due to the fact that, in first approximation, this
Manneville map contains the same information as the Prime Numbers Theorem.
The compression method has shown some differences; however, if these come from
the approximation of the function $\mathrm{Li}(n)$, they are stationary, and so cannot be interpreted
as independent fluctuations on the trend.

So far we have applied theoretical results from number theory and from the
theory of information content in intermittent dynamical systems, obtaining a
"close" relationship between the prime number series $\omega$ and the orbits of the
Manneville map $T_2$. To find a better dynamical model for the prime numbers,
in the sense explained above, we can slightly perturb the map $T_2$ and see what
happens to the information content. This is done below, based only on
numerical experiments performed with the compression algorithm CASToRe.

The statistical properties of the maps $T_z$ are induced by the behavior of the
maps near the origin (the non-hyperbolic fixed point). Hence we have to perturb
$T_2$ by changing its behavior near the origin. We have chosen the following family
of maps: given two positive real numbers $a, b$, the maps $T_{(a,b)}$ are defined by

$$T_{(a,b)}(x) = x + x^2 + a x^b, \qquad x \in (0, \bar{x}) \qquad (21)$$

where $\bar{x}$ satisfies $\bar{x} + \bar{x}^2 + a \bar{x}^b = 1$, and $T_{(a,b)}(x)$ is randomly chosen in $(0, 1)$
according to the Lebesgue measure when $x > \bar{x}$.

Figure 5: EIG between the prime numbers and the different Manneville maps.
The value $\delta \approx 0.53$ corresponds to the $T_2$ map, while $\delta = 0.33$ corresponds to the new hybrid
maps (horizontal axis: $\ln(n)$).

Figures 6-9 show the results of the experiments. Each figure is in bi-linear
scale, and plots the curves corresponding to the information
content of the string $\omega$ (the straight line) and of some different orbits of the
maps $T_{(a,b)}$, for $(a, b)$ equal respectively to $(1,3)$, $(0.1,2)$, $(0.5,2)$ and $(1,2)$.
From the figures we obtain, as a "good" guess for a dynamical model of the
series of prime numbers, the map $T_{(0.5,2)}$, even if in this case the results show
a big difference in the behavior of the information content of different orbits
(recall that we only have results in mean for intermittent maps with an infinite
invariant measure).

Along the same lines, we have calculated the EIG between the prime numbers
series and the different perturbed Manneville maps $T_{(a,b)}$. The results are shown
in Figure 5: in this figure there is the EIG result for $T_2$, with $\delta = 0.53$, whereas
all the perturbed maps give $\delta = 0.33$, a sign of anti-correlated behavior in
the residual Shannon entropy. Hence it seems that the perturbation of
the Manneville map gives better results from the point of view of compression
algorithms, but worse results from the point of view of the EIG.
Figure 6: The information content measured using CASToRe for the series
of the prime numbers and ten different orbits of the map $T_{(1,3)}$. The scale is
bi-linear.

4 Waiting times and distance between primes 

In the analysis of the statistical behavior of time series, an important feature is 
the distance between two different events. In the orbits of intermittent dynami- 
cal systems the events are the chaotic bursts. The importance of these events for 
the intermittent systems is reflected by the fact that the "best" compression was 
done by simply recording how much time an orbit spends in the laminar zone 
(how many symbols "0" there are in the series) between two different chaotic 
bursts (corresponding to the appearance of the symbol "1"). Analogously, for 
the series of prime numbers the events are the appearance of the symbol "1" , 
that is the fact that an integer is prime. Hence the distance between two differ- 
ent events is the distance between two consecutive prime numbers. The series 
of the distances between consecutive prime numbers has been analyzed with
different complexity methods, see for example [17], [18] and references therein.

In this section we compare the behavior of the distances between events for
the prime numbers and for the intermittent dynamical systems $T_{(a,2)}$.

Figure 7: The same as in Figure 6, but for the map $T_{(0.1,2)}$.

For prime numbers there are many different results about the growth of
the distance $(p_{n+1} - p_n)$ between consecutive primes. Using the probabilistic
approach, Cramér gave the conjecture

$$\limsup_{n \to \infty} \frac{p_{n+1} - p_n}{\log^2 p_n} = 1 \qquad (22)$$

To investigate this relation, we have built the time series of the distances from the
prime numbers series and applied the EIG method. Figure 10 shows
that, once more, the hypothetical trend for the distances between primes,
given by Cramér's conjecture, seems to contain most of the information,
yielding $\delta = 0.5$, a sign of independent fluctuations.
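The quantity appearing in conjecture (22) can be explored numerically on a finite range of primes. The sketch below is only an illustration with hypothetical function names; it computes the largest value of the ratio in a finite range, which says nothing rigorous about the limit superior, and it skips the first few primes, where $\log^2 p_n$ is very small.

```python
import math


def primes_up_to(n: int):
    """List of the primes not exceeding n, by the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(2, n + 1) if sieve[i]]


def prime_gaps(primes):
    """Distances between consecutive primes, the 'waiting times' between events."""
    return [q - p for p, q in zip(primes, primes[1:])]


def max_gap_ratio(primes, skip: int = 10) -> float:
    """Largest value of (p_{n+1} - p_n) / log(p_n)**2 after the first `skip` primes."""
    tail = primes[skip:]
    return max((q - p) / math.log(p) ** 2 for p, q in zip(tail, tail[1:]))


if __name__ == "__main__":
    for n in (10**4, 10**5, 10**6):
        ps = primes_up_to(n)
        print(n, max(prime_gaps(ps)), round(max_gap_ratio(ps), 3))
```

The list returned by `prime_gaps` is the distance time series to which a method such as the EIG can be applied, with $\log^2 p_n$ as hypothetical trend.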



This result can be rephrased in terms of the time $t_r$ needed to have $r$
consecutive "0"s in $\omega$. Indeed it holds that $t_r(\omega) > \exp(\sqrt{r})$.

The same quantity $t_r$ was analyzed in [19] for the family of maps
$T_{(a,2)}$. The result is that, for $r$ big enough,

$$K^{-1} < \frac{E_\rho[t_r]}{r \log r} < K \qquad (23)$$

for some positive constant $K$, where the mean $E_\rho$ is taken with respect to a
probability measure $\rho$ absolutely continuous with respect to the Lebesgue measure.
Figure 8: The same as in Figure 6, but for the map $T_{(0.5,2)}$.

Hence, although from the point of view of the information content of the orbits the
intermittent map $T_{(0.5,2)}$ seems to have, in mean with respect to the Lebesgue
measure, the same behavior as $\omega$, the results on the distances between
consecutive events show big differences between the two different kinds of intermittency,
as was suggested by the EIG results. If we also notice that the family of
Manneville maps has a phase transition at $z = 2$, we can conclude that the prime
numbers are well approximated by an intermittent dynamical system similar to
the Manneville maps only to some extent.



5 Conclusion 

We have shown the behavior of two complexity methods, Computable
Information Content and Entropy Information Gain, on the prime numbers series, with
an eye open to possible relations with dynamical systems, using theoretical and
numerical results.

This paper is intended only as an attempt to open a different approach to
the study of prime numbers, the dynamical approach. At the same time it shows
the importance of different notions of complexity, which on time series can show
analogies but also deep differences, as in this case.

Figure 9: The same as in Figure 6, but for the map $T_{(1,2)}$.

Moreover, we remark that in our approach we looked for a deterministic
model for prime numbers, where the model is supposed to be fixed. A different
approach, similar in spirit to the analysis in [12], is to consider a model that is not
fixed, but whose characteristics vary with time. Our view, however, is that
guessing how the characteristics of the model have to vary with time is a problem
of the same difficulty as the approximation of the function $\pi(n)$ of the Prime
Numbers Theorem.